Public Interest::Data Ethics & Practice
Factors represent categorical variables: variables that take on a limited, fixed set of values. In R, a factor is stored as a vector of integer codes together with the set of character values shown when the factor is displayed (colloquially, labels; in R, levels).
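To see the two parts of a factor directly, here is a minimal sketch using a small hypothetical character vector (`use`, not the parcels data):

```r
# Hypothetical values standing in for a categorical variable
use <- c("Condominium", "Apartment", "Condominium", "Single Family")
f <- factor(use)
levels(f)      # "Apartment" "Condominium" "Single Family" (alphabetical by default)
as.integer(f)  # 2 1 2 3: each code is a position in levels(f)
typeof(f)      # "integer": the storage type underneath the factor
class(f)       # "factor"
```

Printing `f` shows the labels, but the integer codes are what R actually stores.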
library(dplyr) # provides %>%, mutate(), and count()
parcels %>% count(use_code_reduced) # currently a character
parcels %>%
  mutate(use_code_reduced = factor(use_code_reduced)) %>% # make a factor (levels sorted alphabetically)
  count(use_code_reduced)
# set the ordering of the factor levels explicitly;
# any value not listed in use_levels becomes NA
use_levels <- c("Apartment", "Du-Tri-Quadplex", "Condominium", "Single Family")
parcels %>%
  mutate(use_code_reduced = factor(use_code_reduced, levels = use_levels)) %>%
  count(use_code_reduced)
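Supplying `levels` has two side effects worth checking on a small example: values missing from the list become `NA`, and listed levels with no observations are dropped from `count()` output by default. The sketch below assumes a hypothetical tibble `toy` in place of `parcels`, with a made-up "Vacant" category that is not in `use_levels`.

```r
library(dplyr)
# Hypothetical stand-in for parcels; "Vacant" is deliberately absent from use_levels
toy <- tibble(use_code_reduced = c("Single Family", "Apartment", "Vacant", "Apartment"))
use_levels <- c("Apartment", "Du-Tri-Quadplex", "Condominium", "Single Family")
counts <- toy %>%
  mutate(use_code_reduced = factor(use_code_reduced, levels = use_levels)) %>%
  count(use_code_reduced)
counts
# Apartment (2) and Single Family (1) appear in level order, plus an NA row for "Vacant";
# Du-Tri-Quadplex and Condominium have no observations, so count() omits them
```

Passing `.drop = FALSE` to `count()` keeps the empty levels with a count of 0.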
The forcats package, part of the tidyverse, provides helper functions for working with factors.
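As an illustration, the sketch below applies three forcats functions (`fct_relevel()`, `fct_infreq()`, `fct_recode()`) to a small hypothetical factor; they are a sample of the package, not a complete list.

```r
library(forcats)
# Hypothetical factor: Apartment x3, Single Family x2, Condominium x1
use <- factor(c("Apartment", "Condominium", "Apartment",
                "Single Family", "Apartment", "Single Family"))
levels(use)         # "Apartment" "Condominium" "Single Family" (alphabetical)
# Move one level to the front; the others keep their relative order
relevelled <- fct_relevel(use, "Single Family")
levels(relevelled)  # "Single Family" "Apartment" "Condominium"
# Reorder levels by frequency, most common first
by_freq <- fct_infreq(use)
levels(by_freq)     # "Apartment" "Single Family" "Condominium"
# Rename a level using new_name = "old_name"
recoded <- fct_recode(use, Condo = "Condominium")
levels(recoded)     # "Apartment" "Condo" "Single Family"
```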
Joins merge two data sets based on one or more shared key variables. The dplyr join functions share the same basic syntax, where name is full, left, right, or inner:
name_join(x, y, by = "key")
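A minimal sketch of this syntax, using two small hypothetical tables (`parcels_small` and `assessments`, matched on a made-up `parcel_id` key), shows how the choice of join changes which rows survive:

```r
library(dplyr)
# Hypothetical tables: parcel_id 1 exists only in x, parcel_id 4 only in y
parcels_small <- tibble(parcel_id = c(1, 2, 3),
                        use = c("Apartment", "Condominium", "Single Family"))
assessments <- tibble(parcel_id = c(2, 3, 4),
                      value = c(250000, 310000, 180000))
left  <- left_join(parcels_small, assessments, by = "parcel_id")   # 3 rows; value is NA for id 1
inner <- inner_join(parcels_small, assessments, by = "parcel_id")  # 2 rows: ids 2 and 3
full  <- full_join(parcels_small, assessments, by = "parcel_id")   # 4 rows: ids 1 to 4
right <- right_join(parcels_small, assessments, by = "parcel_id")  # 3 rows; use is NA for id 4
```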
Animated visuals created by Garrick Aden-Buie
full_join(): keeps all observations in x and y
left_join(): keeps all observations in x
right_join(): keeps all observations in y
inner_join(): keeps only observations whose key appears in both x and y
separate: Split a single column into multiple columns by separating each cell in the column into a row of cells.
separate(df, col = rate, into = c("cases", "pop"), sep = "/")
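A runnable version of the call above, assuming a small hypothetical `df` whose `rate` column holds strings such as "10/1000":

```r
library(tibble)
library(tidyr)
# Hypothetical data: rate stores cases and population as one "cases/pop" string
df <- tibble(country = c("X", "Y"), rate = c("10/1000", "25/5000"))
out <- separate(df, col = rate, into = c("cases", "pop"), sep = "/")
out  # columns country, cases, pop; cases and pop are still character
# convert = TRUE asks separate() to guess better column types for the new columns
out_num <- separate(df, col = rate, into = c("cases", "pop"), sep = "/", convert = TRUE)
```

In tidyr 1.3.0 and later, `separate()` is marked as superseded in favor of `separate_wider_delim()`, though it still works.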
unite: Combine several columns into a single column by uniting their values across rows.
unite(df, col = year, century:year, sep = "")
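A runnable version of the call above, assuming a hypothetical `df` in which the year is split into a `century` column and a two-digit `year` column:

```r
library(tibble)
library(tidyr)
# Hypothetical data: "19" + "99" should become "1999"
df <- tibble(country = c("X", "Y"), century = c("19", "20"), year = c("99", "00"))
out <- unite(df, col = year, century:year, sep = "")
out  # columns country and year; year holds "1999" and "2000"
```

By default `unite()` drops the input columns (`remove = TRUE`), so only the combined `year` column remains alongside `country`.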
pivot_longer: Convert wide data to long, or move variable values out of the column names and into the cells.
pivot_longer(df, cols = -country, names_to = "year", values_to = "cases")
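A runnable version of the call above, assuming a hypothetical wide `df` with one column per year:

```r
library(tibble)
library(tidyr)
# Hypothetical wide data: the years are stored in the column names
df <- tibble(country = c("X", "Y"), `1999` = c(10, 20), `2000` = c(15, 25))
long <- pivot_longer(df, cols = -country, names_to = "year", values_to = "cases")
long  # 4 rows, one per country-year; year is a character column
```

Because `year` comes from column names it arrives as character; adding `names_transform = list(year = as.integer)` is one way to convert it during the pivot.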
pivot_wider: Convert long data to wide, or move variable names out of the cells and into the column names.
pivot_wider(df, id_cols = country, names_from = type, values_from = count)
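A runnable version of the call above, assuming a hypothetical long `df` with one row per country and measurement type:

```r
library(tibble)
library(tidyr)
# Hypothetical long data: type names the variable, count holds its value
df <- tibble(
  country = c("X", "X", "Y", "Y"),
  type    = c("cases", "population", "cases", "population"),
  count   = c(10, 1000, 25, 5000)
)
wide <- pivot_wider(df, id_cols = country, names_from = type, values_from = count)
wide  # 2 rows, columns country, cases, population
```

`id_cols` should uniquely identify each output row. If `df` also had a `year` column, `id_cols = country` would leave several values per cell, and `pivot_wider()` would warn and return list-columns; including `year` in `id_cols` avoids that.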
Go to Slack and copy the practice script for today (learningRweek9.R). Then open an RStudio session using the learningRweek9.Rproj file.